Similarity Search in High-Dimensional Data Spaces
Abstract
This paper summarizes analytical and experimental results for the nearest neighbor similarity search problem in high-dimensional vector spaces, for methods that rely on a space- or data-partitioning scheme. Under the assumptions of uniformity and independence of the data, we formally show and experimentally demonstrate that conventional approaches to the nearest neighbor problem degenerate as the dimensionality of the data space becomes large. Given the experimental results, we recommend using scan-based algorithms for nearest neighbor search whenever the dimensionality is larger than around 5.
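To make the recommendation concrete, here is a minimal sketch of a scan-based nearest neighbor search. It is an illustration only, not the algorithm evaluated in the paper: the function name `nearest_neighbor_scan`, the choice of Euclidean distance, and the uniformly random sample data are all assumptions made for this example.

```python
import math
import random


def nearest_neighbor_scan(points, query):
    """Return (index, distance) of the point closest to `query`.

    Assumes Euclidean distance; every point is visited exactly once,
    so the cost is linear in the number of points and in the dimensionality.
    """
    best_index, best_dist = -1, math.inf
    for i, p in enumerate(points):
        d = math.dist(p, query)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


# Hypothetical data: 1000 uniformly distributed points in 20 dimensions.
random.seed(0)
dim = 20
points = [[random.random() for _ in range(dim)] for _ in range(1000)]
query = [random.random() for _ in range(dim)]

index, dist = nearest_neighbor_scan(points, query)
print(index, dist)
```

A scan of this kind needs no index structure at all. Its cost does not depend on how well the data can be partitioned, which is the property the abstract appeals to for higher dimensionalities.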
Similar resources
Retrieval of Optimal Subspace Clusters Set for an Effective Similarity Search in a High-Dimensional Spaces
High-dimensional data is often analysed by examining its distribution properties in subspaces. Subspace clustering is a powerful method for eliciting the features of high-dimensional data. The result of subspace clustering can be an essential basis for building indexing structures and for further data search. However, a high number of subspaces and data instances can conceal a high number of subspace...
The Theory and Practice of Similarity Searches in High Dimensional Data Spaces
Similarity search in multimedia databases is typically performed on abstractions of multimedia objects, also called features, rather than on the objects themselves. Though the feature extraction process is application specific, the resulting features are most often considered as points in high-dimensional vector spaces (e.g. the color indexing method of Stricker and Orengo [SO95]). Similarit...
SPY-TEC: An efficient indexing method for similarity search in high-dimensional data spaces
Most index structures based on the R-tree have failed to support efficient indexing mechanisms for similarity search in high-dimensional data spaces. This is because most of these index structures use a balanced split strategy in order to guarantee storage utilization, while the shape of a similarity search query in high-dimensional spaces is a hypersphere. In this paper...
A Divisive Hierarchical Clustering-Based Method for Indexing Image Data
It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. Many efforts have been made so far to develop multi-dimensional indexing structures. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...
The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces
Feature-based similarity search is emerging as an important search paradigm in database systems. The technique maps the data items to points in a high-dimensional feature space, which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search over the data structure. Although several data structures have been proposed for feature indexing...